Readme for the Pathways Corpus
==============================

How is the data stored 
=======================

All datasets are stored as UTF-8 encoded, tab-separated value files.
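
As a minimal sketch, a single file could be loaded with Python's standard csv module as shown below. The helper name read_corpus is hypothetical. The sketch assumes that the first row of each file holds the variable names and that fields are not quoted; neither is stated in this Readme.

```python
import csv


def read_corpus(path):
    """Read one tab-separated corpus file into a list of dicts (one per row)."""
    # Assumption: the first row of the file contains the column names.
    with open(path, encoding="utf-8", newline="") as handle:
        # Assumption: fields are not quoted, so quotation marks inside
        # question texts are kept as literal characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

For example, read_corpus("pathways_corpus_de.tsv") would return the German questions as a list of dictionaries keyed by variable name.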


What data is included
======================

pathways_corpus_be.tsv, Belgium, 2010-2014, source: https://www.lachambre.be/
pathways_corpus_de.tsv, Germany, 2009-2013, source: https://www.bundestag.de/
pathways_corpus_el.tsv, Greece, 2015, source: https://www.hellenicparliament.gr/
pathways_corpus_es.tsv, Spain, 2011-2015, source: http://www.congreso.es/
pathways_corpus_fr.tsv, France, 2007-2012, source: http://www.assemblee-nationale.fr/
pathways_corpus_it.tsv, Italy, 2008-2013, source: http://dati.camera.it/  
pathways_corpus_nl.tsv, Netherlands, 2010-2012, source: https://zoek.officielebekendmakingen.nl/
pathways_corpus_uk.tsv, United Kingdom, 2010-2015, source: https://www.theyworkforyou.com/


What variables are included
===========================

qid: identifier of written question
qdate: date of written question
name: name of MP
mp_id: identifier of MP
party: party of MP
cio: Citizen of Immigrant Origin status of MP
title: title of written question
area: subject area of written question
department: department to which the written question is addressed
prestring: prestring of written question
text: text of written question

Please note that not every dataset includes all variables due to country-specific differences.
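
Because the available variables differ by country, it can help to check a file's header before analysis. The following is a sketch only: the helper name missing_variables is hypothetical, and it assumes that each file starts with a header row using exactly the variable names listed above.

```python
import csv

# Variable names as listed in this Readme.
EXPECTED_VARIABLES = [
    "qid", "qdate", "name", "mp_id", "party", "cio",
    "title", "area", "department", "prestring", "text",
]


def missing_variables(path):
    """Return the documented variables that are absent from one corpus file."""
    # Assumption: the first row of the file contains the column names.
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
    return [name for name in EXPECTED_VARIABLES if name not in header]
```

An empty list means that the file contains all documented variables.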


How to link the Pathways Corpus with other Pathways data
========================================================

The corpus can be merged with datasets from other work packages, e.g. WP1, on the variable mp_id,
which is a unique and consistent identifier for Members of Parliament across the Pathways Project.
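
A merge on mp_id could look roughly like the sketch below, which works on rows held as lists of dictionaries. The function name merge_on_mp_id and the variable birth_year in the usage note are hypothetical; the structure of the WP1 data is not described in this Readme and is assumed here to have one row per MP with an mp_id field.

```python
def merge_on_mp_id(questions, mp_records):
    """Left-join question rows with MP-level rows on the shared mp_id variable."""
    # Index the MP-level records by identifier (assumed: one row per MP).
    by_id = {record["mp_id"]: record for record in mp_records}
    merged = []
    for question in questions:
        extra = by_id.get(question.get("mp_id"), {})
        # On name clashes the corpus value is kept; questions without a
        # matching MP record are passed through unchanged.
        merged.append({**extra, **question})
    return merged
```

Given a question row with mp_id "7" and an MP-level row {"mp_id": "7", "birth_year": "1970"}, the merged row carries both the question variables and birth_year.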


Documentation of retrieving Written Questions
==============================================

The .html files contain (most of) the Python code that was used to retrieve written questions from the various sources.
Please keep in mind that this code is unlikely to keep working, as the corresponding data sources change frequently
and not all of them are publicly available.
One example: for Spain, we obtained a DVD containing text files with written questions.